Systems and methods for use in blocking of robocall and scam call phone numbers

ABSTRACT

Telephone numbers that are associated with robocalls or scam calls can be automatically identified by a telephone network operator. The identified telephone numbers may be blocked from placing phone calls on the telephone network.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 17/560,555, filed on Dec. 23, 2021, which claims priority to U.S. Provisional Application No. 63/132,605, filed on Dec. 31, 2020, the entire contents of each of which is incorporated herein by reference for all purposes.

TECHNICAL FIELD

The current application relates to blocking calls and in particular to blocking calls placed from numbers used in robocalls and/or scam calls.

BACKGROUND

Scam telephone calls and robocalls are becoming an increasing problem. There are solutions that can be used to block inbound calls from numbers known to be used by robocallers and/or for scam calls. While these solutions may be useful, they require an end user to install some application or use a device to provide the desired functionality.

It is difficult to adapt existing solutions from end-user devices to a telephone network level as it may be unacceptable for the telephone network to block a number that was incorrectly identified as being associated with a robocall or scam call. Identifying telephone numbers associated with robocalls or scam calls based on data available to network operators can be a difficult task given the volume of data that needs to be processed.

It would be desirable to have new, additional and/or improved tools for use by telephone network operators in identifying and blocking telephone numbers associated with making robocalls and/or scam calls.

BRIEF DESCRIPTION OF THE DRAWINGS

Further features and advantages of the present disclosure will become apparent from the following detailed description taken in combination with the appended drawings, in which:

FIG. 1 depicts a system for identifying and blocking phone numbers associated with robocalls and/or scam calls;

FIG. 2 depicts a user interface presenting identified phone numbers;

FIG. 3 depicts a method for identifying and blocking phone numbers associated with robocalls and/or scam calls;

FIG. 4 depicts a method of pre-processing raw call log records;

FIG. 5 depicts a method for unblocking blocked phone numbers; and

FIG. 6 depicts a Precision-Recall curve.

DETAILED DESCRIPTION

In accordance with the present disclosure there is provided a system for use in blocking phone numbers in a telephone network comprising: one or more processors for executing instructions; and at least one memory for storing instructions, which when executed by at least one of the one or more processors configure the system to perform a method comprising: receiving from a plurality of telephone network elements a plurality of raw call log records; periodically processing the received plurality of raw call log records comprising: formatting each of the raw call log records into a corresponding call record having a common format; and identifying raw call log records or call records associated with a same call; and aggregating raw call log records or call records associated with the same call together; periodically processing the call logs comprising: processing the call logs using a first trained model to identify phone numbers associated with anomalous call behaviour as anomalous phone numbers; and processing the call logs using a second trained model to identify phone numbers associated with a first undesirable type of call behaviour as first undesirable call type phone numbers; and blocking at least one phone number of the anomalous phone numbers and the first undesirable call type phone numbers from making calls over the telephone network.

In an embodiment of the system, the first undesirable call type is a Wangiri type scam call.

In an embodiment of the system, the at least one phone number that is blocked is further processed to ensure the number should be blocked prior to being blocked.

In an embodiment of the system, the method provided by executing the instructions further comprises: automatically calling at least one of the phone numbers of the anomalous phone numbers and the first undesirable call type phone numbers; and recording a portion of the calls made automatically.

In an embodiment of the system, the method provided by executing the instructions further comprises: generating a user interface including an indication of one or more of the anomalous phone numbers and the first undesirable call type phone numbers; providing the generated user interface to an investigator of the telephone network operator; and receiving from the user interface a selection including the at least one phone number for blocking.

In an embodiment of the system, the generated user interface further includes an indication of the recorded portion of the calls.

In an embodiment of the system, the method provided by executing the instructions further comprises: retrieving additional information from one or more sources on the anomalous phone numbers and the first undesirable call type phone numbers; and including the additional information in the generated user interface.

In an embodiment of the system, the method provided by executing the instructions further comprises: unblocking blocked phone numbers.

In an embodiment of the system, unblocking blocked phone numbers comprises: identifying blocked phone numbers; and for each blocked phone number, determining if there has been no call activity over the telephone network associated with the blocked phone number for a threshold period of days, and unblocking the blocked phone number when it is determined that there has been no call activity for the threshold period of days.
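The unblocking rule described above can be sketched as follows. This is a minimal illustration only: the function name, the `last_activity` mapping, the 30-day default threshold, and the choice to treat a number with no recorded activity as inactive are all assumptions made for the example, not details given in the disclosure.

```python
from datetime import datetime, timedelta


def numbers_to_unblock(blocked, last_activity, now, threshold_days=30):
    """Return the blocked numbers with no call activity in the last threshold_days.

    blocked: iterable of blocked phone numbers (hypothetical representation).
    last_activity: dict mapping a phone number to the time of its most recent
        call activity on the network; a missing entry is treated as "no
        activity", which is an assumption of this sketch.
    """
    cutoff = now - timedelta(days=threshold_days)
    result = []
    for number in blocked:
        last_seen = last_activity.get(number)
        if last_seen is None or last_seen <= cutoff:
            result.append(number)
    return result


# Illustrative data only.
now = datetime(2021, 12, 23)
blocked = ["+15550001", "+15550002"]
last_activity = {
    "+15550001": datetime(2021, 12, 20),  # active three days ago: stays blocked
    "+15550002": datetime(2021, 10, 1),   # inactive for months: unblocked
}
to_unblock = numbers_to_unblock(blocked, last_activity, now)
```

In practice the threshold would be an operator-chosen configuration value.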

In accordance with the present disclosure there is further provided a method for use in detecting fraudulent phone numbers associated with undesirable behavior in a telephone network.

In accordance with the present disclosure there is further provided a system for detecting fraudulent phone numbers associated with undesirable behavior in a telephone network.

Undesirable phone calls can be a problem for consumers. These calls may include various types of scams or other undesirable calls. For example, some calls may impersonate a revenue agency such as the Canada Revenue Agency (CRA) or the Internal Revenue Service (IRS) and have the victim transfer money or other payments to the perpetrator. Other types of scam calls may include Wangiri, or “one ring” calls in which a scammer calls a target from a phone number and hangs up after one or two rings, or just long enough to register as a missed call. This process may be repeated from the same or a slightly different phone number. If the target calls back the phone number, for example out of curiosity, the return number may be for a “pay to call” or premium rate number causing the target to pay these charges. These types of scam calls may be made by robocalls, or may use robocalls to identify possible phone numbers that are active. As described further below, a telephone network operator may collect and process call data from their telephone network in order to identify phone numbers associated with the undesirable behaviours. Once such phone numbers are identified, they may be blocked from making and/or receiving calls on the telephone network operator's network.

FIG. 1 depicts a system for identifying and blocking phone numbers associated with robocalls and/or scam calls. The system 100 can be implemented by an operator of a telephone network 102, which may include different telephony technologies including for example, Voice over IP (VoIP), cellular, and landline or SS7. Regardless of the particular type or composition of telephone network, it will comprise a plurality of network elements 104 a, 104 b, 104 c (referred to collectively as network elements 104) for completing telephone calls. The network elements 104 may connect the telephone network 102 to consumer (or end user) equipment such as telephones 106 a, 106 b, 106 c (referred to collectively as telephones 106) as well as to other telephone networks 108 or other telephony equipment. Each of the network elements 104 may generate logs for each call, or attempted call, handled by the network elements 104. The logs may include various information about the call such as the telephone number of the party being called (called party or destination number), the telephone number of the party calling (calling party or source number), the time the call was placed, if the call was answered, if the call was answered by a voice message system, a geographical location of the party calling, a geographical location of the party being called, as well as other possible information such as identifying information about the caller/callee devices. As described in further detail below, the log information collected for calls may be processed to identify and block phone numbers associated with undesirable behaviour.

The processing of the data collected from the various network elements 104 may be performed by one or more servers 110. The server(s) 110 comprises one or more processing units 112 for executing instructions and memory units 114 for storing instructions which when executed by the processing units 112 configure the server(s) 110 to provide functionality for identifying and blocking phone numbers associated with undesirable behaviour. The server(s) 110 may also include non-volatile (NV) storage 116 as well as one or more input/output (I/O) interfaces 118 for connecting internal and/or external components, devices and/or peripherals to the server(s) 110.

The functionality 120, which is provided by executing the instructions stored in the memory, includes data collection functionality 122 for processing the data collected by the network elements 104, detection functionality 124 for detecting, or rather identifying, phone numbers associated with undesirable behaviour, action functionality 126 for blocking and unblocking phone numbers, investigative interface functionality 128 for providing an interface to investigators of the telephone network operator, as well as additional investigative processing functionality 130.

Broadly, the data collected by the network elements 104 is pre-processed by the data collection functionality 122 and the pre-processed data is used by the detection functionality 124 to identify phone numbers associated with undesirable call behaviour. The identified phone numbers associated with undesirable call behaviour can be blocked/unblocked or other actions may be taken by the action functionality 126. The actions may be taken automatically, or may be taken based on additional user (e.g. network operator level) input. In a non-limiting example, the additional user input may be provided by an investigator using an interface provided by the investigative interface functionality 128. The investigative interface functionality 128 may also use or solicit additional information that may be useful to the investigator and provided by the investigative processing functionality 130.

As described above, the data collection functionality 122 pre-processes data collected by the network elements 104. The raw call log data may be stored or accessed in numerous different ways, which are depicted schematically as a database 132 in FIG. 1. The raw call data log records from the network elements 104 are processed by log pre-processing functionality 134 to generate processed call records 136. The pre-processing may include minor processing, such as cleaning and standardizing records so that dates and times provided by different network elements, and thus possibly in different formats, are in the same format, as well as more major processing. For example, the processing may include identifying and aggregating raw call records, and/or possibly previously processed call records, that are associated with the same call. Aggregating call records associated with the same call can be achieved in various ways. For example, the records may be aggregated together into a single aggregate call record. Additionally or alternatively, the call records associated with the same call may be labeled with a unique call identifier to allow aggregated records to be quickly identified. Additionally or alternatively, a record or other indicator can be provided that identifies all of the related call records that are associated with the same call. In addition to the unique call identification, the processing may further include computing or determining any metrics or features used in the anomaly and/or scam detection.
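The pre-processing described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the raw field names (`from`, `to`, `time`, `element`), the two timestamp formats, and the rule that records with the same source and destination starting within a few seconds of each other belong to the same call are all hypothetical; the disclosure does not specify how same-call records are matched.

```python
from datetime import datetime

# Hypothetical timestamp formats emitted by different network elements.
TIME_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%d/%m/%Y %H:%M:%S")


def parse_time(text):
    """Parse a timestamp written in any of the known formats."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")


def to_call_record(raw):
    """Format one raw log record into a call record with a common format."""
    return {
        "source": raw["from"].strip(),
        "destination": raw["to"].strip(),
        "start": parse_time(raw["time"]),
        "element": raw["element"],
    }


def aggregate_same_call(records, window_seconds=5):
    """Merge records for the same call into one aggregate record.

    Assumption of this sketch: records share a call when source and
    destination match and the start time is within window_seconds of the
    first record of the group.
    """
    calls = []
    ordered = sorted(records, key=lambda r: (r["source"], r["destination"], r["start"]))
    for rec in ordered:
        last = calls[-1] if calls else None
        same_call = (
            last is not None
            and last["source"] == rec["source"]
            and last["destination"] == rec["destination"]
            and (rec["start"] - last["start"]).total_seconds() <= window_seconds
        )
        if same_call:
            last["elements"].append(rec["element"])
        else:
            calls.append({
                "call_id": len(calls) + 1,  # unique call identifier
                "source": rec["source"],
                "destination": rec["destination"],
                "start": rec["start"],
                "elements": [rec["element"]],
            })
    return calls


# Illustrative raw logs: the first two describe the same call seen by two elements.
raw_logs = [
    {"element": "104a", "from": "15550001", "to": "15550002", "time": "2021-12-23T10:00:00"},
    {"element": "104b", "from": "15550001", "to": "15550002", "time": "23/12/2021 10:00:02"},
    {"element": "104a", "from": "15550001", "to": "15550003", "time": "2021-12-23T10:05:00"},
]
calls = aggregate_same_call([to_call_record(r) for r in raw_logs])
```

Here the aggregate record carries both a unique call identifier and the list of contributing elements, corresponding to two of the aggregation options mentioned above.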

The raw call data logs may be periodically processed in relatively short periods. For example, the raw call data logs may be processed every 5 minutes. Alternatively, this processing may be done in longer or shorter intervals, or possibly in real time. Regardless of the time intervals of processing the raw call data logs, once the records are processed by the log pre-processing functionality 134 the resulting call records 136 can be stored for subsequent processing by the detection functionality 124.

The detection functionality 124 may comprise various different functionality for processing the call records 136 to identify phone numbers associated with undesirable behaviour. As depicted in FIG. 1, the detection functionality may include general anomaly detection functionality 138 that detects anomalous behaviour in call patterns. The phone numbers that are identified by the general anomaly detection functionality 138 may be associated with behaviours that are out of the ordinary, although they may not need to be blocked. In a non-limiting example, the anomalous phone numbers identified by the general anomaly detection functionality 138 may be presented to investigators, which may help speed the identification of additional scams or undesirable call behaviour. The general anomaly detection may be done in various ways using algorithms or techniques for identifying anomalies. In addition to the general anomaly detection functionality 138, the detection functionality 124 may further include specialized detection models 140 that detect specific undesirable call behaviour. For example, the specialized detection models 140 may include a Wangiri fraud detection model 142 that detects phone numbers, and in particular caller phone numbers, associated with Wangiri fraud calls. Additional detection models 144 may include models trained to detect other specific types of possibly undesirable call behaviour, such as revenue service call fraud, Microsoft™ support scam, etc.

Each of the detection functionalities 138, 142, 144 may label or otherwise provide some other indication of the phone numbers that were detected by the various functionalities as possibly being associated with undesirable call behaviour. That is, for example, the general anomaly detection functionality 138 may provide an indication of one or more phone numbers that were determined to be anomalous, the Wangiri detection functionality 142 may provide an indication of one or more phone numbers that were determined to be associated with Wangiri fraud calls, etc. Details of illustrative implementations of both the general anomaly detection functionality 138 and the Wangiri detection functionality 142 are described in further detail below.

Once one or more phone numbers have been identified by the detection functionality 124, one or more actions may be taken on the phone numbers by action functionality 126. The actions may be taken automatically, or may be taken after some form of user interaction, for example by an investigator of the network operator. For example, an anomalous phone number may not be blocked automatically, but may be marked for blocking after an investigation or further review by an investigator. As depicted, the action functionality 126 may include phone number blocking functionality 146 and phone number unblocking functionality 148.

Depending upon how phone numbers are marked for blocking as well as the level of acceptability of potentially blocking a valid phone number, the blocking functionality 146 may, in a non-limiting example, simply automatically block all provided or marked phone numbers. Alternatively, the blocking functionality may include one or more checks or business rules that are applied to the phone numbers marked for blocking and only those phone numbers passing all of the checks may be blocked.

The phone numbers identified by the detection functionality 124 may be automatically passed to the phone number blocking functionality 146, or they may first be passed to investigative interface functionality 128 for generating an interface for use by an investigator. The investigative interface functionality 128 may include a graphical user interface (GUI) generation functionality 150 that generates an investigative interface that may present the identified telephone numbers to an investigator, which may allow the investigator to determine whether or not the phone number(s) should be blocked. The GUI that is generated may include an indication, such as a button or other GUI element, that allows the investigator to select a phone number for subsequent blocking by the phone number blocking functionality 146. In addition to providing an indication of one or more of the phone numbers identified by the detection functionality 124, the GUI may further include additional information that may be helpful to an investigator in determining whether to block a phone number or not.

In order to provide the additional information, the investigative interface functionality 128 may include data collection functionality 152 for retrieving or accessing the additional information presented in the generated GUI. The data collection functionality 152 may retrieve information from various sources. For example, the data collection functionality may retrieve information from one or more subscriber data sources of the telephone network operator to retrieve information associated with phone numbers that are provided by the telephone network operator. Additionally, the data collection functionality 152 may retrieve information from other sources such as provided by the investigative processing functionality 130.

The investigative processing functionality 130 may include one or more different functionalities or elements for providing additional relevant information. For example, the investigative processing functionality may include honey pot number functionality 154 that provides a honey pot phone number that is not used for other purposes and as such any numbers calling the honey pot phone number may be considered anomalous or presenting undesirable behaviour. Additionally, the investigative processing functionality 130 may include automated call-back functionality 156 that can call back identified phone numbers, including for example suspicious numbers or those potentially associated with undesirable behaviours, and record the phone call. The automated call-back functionality 156 may simulate a call. Additionally, the investigative processing functionality 130 may include 3rd party data collection functionality 158 that can retrieve or access information from 3rd party sources such as yellow-page information or 3rd party sources collecting information about robocalls or possible fraudulent calls.

FIG. 2 depicts a portion of an illustrative user interface. The GUI 200 may include, for example, an area 202 indicating the phone numbers as well as an area with the predictions 204 for each of the numbers, such as either being a relatively certain Wangiri, or a Wangiri that requires manual review. The GUI may also include an area 206 that enables a user to provide their own categorization of the call, as well as another area showing other information such as a recording of the call 208. It will be appreciated that other GUIs and/or layouts are possible.

Returning to the general anomaly detection functionality 138 depicted in FIG. 1, the functionality 138 may use an Isolation Forest approach for detecting anomalies. The anomalies may be detected over various time periods such as hours, days, weeks, etc. As will be appreciated by those skilled in the art, the Isolation Forest algorithm is an unsupervised variant of the Random Forest algorithm, which ensembles multiple weak predictors, also known as trees. In a non-limiting example, the features used by the Isolation Forest model may include, among others:

-   num_incoming_calls, which is the number of unique incoming calls;
-   num_outgoing_calls, which is the number of unique outgoing calls;
-   incoming_call_rate, which is num_incoming_calls/num_outgoing_calls;
-   call_duration, which is how long a conversation lasts for a given call record;
-   num_callees, which is the number of unique callees; and
-   inter_start_time, which is the start time of a call.

The Isolation Forest model, tuned using features including those mentioned above, may assign an anomaly score to each anumber, or originating phone number, which may also be known as the calling party or caller. Experiments have shown that the more anomalous the behavior of a particular anumber is, as defined by the features including those mentioned above, the more likely it is to be assigned a higher anomaly score by the Isolation Forest algorithm, as compared to anumbers that demonstrate “normal” behavior. It will be appreciated that in order to evaluate the detection performance of the Isolation Forest model, one or more sources of verified anomalous phone numbers may be used. For example, the sources used may be Yellow Pages and/or Nomorobo or other similar sources, which are relatively less biased sources of information due to their crowd-sourced nature.

During performance tuning using Yellow Pages sourced data, it was found that the naïve Isolation Forest model did not result in acceptable accuracy when evaluated by the Yellow Pages reported rate (as defined below). By performing experiments, however, it was found that the addition of a filtering step that eliminated all anumbers with fewer outgoing calls than a threshold improved the accuracy by approximately 30% when compared to the baseline. Note that the filtering step was not used when evaluating the model using Nomorobo data, and the model still achieved acceptable accuracy.

As a result of the performance tuning experiments, it will be appreciated that the general anomaly detection functionality 138 may, in a non-limiting example, include two Isolation Forest models: (1) the naïve Isolation Forest model that detects anomalies that are likely to be also reported by Nomorobo, and (2) the filter controlled Isolation Forest model that detects anomalies that are likely to be also reported by Yellow Pages.

Varied measures may be used to evaluate the performance of the two Isolation Forest models. To evaluate the filter controlled model using Yellow Pages sourced data, one measure that may be used is the Yellow Pages reported rate Y which, for an anumber a that is flagged by the model and is also reported on Yellow Pages, is given by:

$$Y = \frac{1}{N}\sum_{a} YP_{reported}(a), \quad \text{where} \quad YP_{reported}(a) = \begin{cases} 1 & \text{if } \dfrac{YP_{scammer}(a) + YP_{debt}(a)}{YP_{total}(a)} > 0.5; \\ 0 & \text{otherwise.} \end{cases}$$

And N is the total number of anomalies detected by the model.

To evaluate the naïve model using Nomorobo sourced data, one measure that may be used, for example, is the Nomorobo reported rate Φ, which, for an anumber a that is flagged by the model, is given by:

$$\Phi = \frac{1}{N}\sum_{a} Nomorobo_{reported}(a), \quad \text{where} \quad Nomorobo_{reported}(a) = \begin{cases} 1 & \text{if the number is found reported as a robocaller in Nomorobo;} \\ 0 & \text{otherwise.} \end{cases}$$

And N is the total number of anomalies detected by the model.
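The two reported rates defined above can be computed directly from the list of flagged anumbers. The sketch below is illustrative only: the function names, the per-number tuple of Yellow Pages report counts, and the representation of Nomorobo data as a set are assumptions of the example, and a number with no Yellow Pages reports is counted as not reported.

```python
def yp_reported(scammer, debt, total):
    """Indicator: 1 if more than half of the Yellow Pages reports for a
    number are scammer or debt-collector reports, otherwise 0."""
    if total == 0:
        return 0  # assumption: no reports means not reported
    return 1 if (scammer + debt) / total > 0.5 else 0


def yellow_pages_reported_rate(flagged, yp_counts):
    """Y: fraction of the N flagged anumbers that are reported on Yellow Pages.

    yp_counts maps an anumber to (scammer_reports, debt_reports, total_reports).
    """
    n = len(flagged)
    if n == 0:
        return 0.0
    hits = sum(yp_reported(*yp_counts.get(a, (0, 0, 0))) for a in flagged)
    return hits / n


def nomorobo_reported_rate(flagged, nomorobo_robocallers):
    """Phi: fraction of the N flagged anumbers listed as robocallers."""
    n = len(flagged)
    if n == 0:
        return 0.0
    return sum(1 for a in flagged if a in nomorobo_robocallers) / n


# Illustrative data only.
flagged = ["A", "B", "C", "D"]
yp_counts = {"A": (3, 1, 5), "B": (1, 0, 4), "C": (0, 2, 2)}
y_rate = yellow_pages_reported_rate(flagged, yp_counts)   # A and C count: 2/4
phi_rate = nomorobo_reported_rate(flagged, {"A"})          # only A counts: 1/4
```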

The Yellow Pages reported rate indicates how many anumbers out of the detected anomalies are reported as scammers or debt collectors on Yellow Pages, while the Nomorobo reported rate indicates how many anumbers out of the detected anomalies are reported as robocallers in Nomorobo.

The execution time for performing a grid search in order to tune the parameters of the Isolation Forest models was found to be prohibitive. Therefore, random search was performed instead, using the popular Python library scikit-learn. For the model tuned using Nomorobo sourced data, with 3-fold cross-validation, the resulting accuracy in terms of Nomorobo reported rate was 48.4%.

For the filter controlled Isolation Forest model, an iterative grid search was performed. With 3-fold cross-validation, the resulting accuracy in terms of Yellow Pages reported rate was 60%.

The Precision score was evaluated in a real run, and was calculated by dividing the total number of distinct anumbers that were reported in either Nomorobo or in Yellow Pages by the total number of all anumbers detected as anomalies. The best Precision score observed was 73.87%, on Jan. 3, 2019. During business days, the Precision score is usually observed to be around 60%, while on Sundays, it is usually observed to be less than 40%.

The above has described the anomaly detection as attempting to detect robocalls and/or debt collector/telemarketer calls. It will be appreciated that other anomalous behaviours may also be detected. For example, profile based, or caller behavior based, anomaly detection is possible. In profile based anomaly detection, for example, one might first establish a profile for each caller in the data. The profile may be established by looking at all available call history for each caller, or a subset of it. By analyzing these profiles, it is possible to find unusually deviant behavior, such as sudden spikes/drops in number of calls, sudden increase in calls to a specific destination number, etc. This may help, for example, in detecting spoofed numbers. To build up each caller's profile, time series analysis can be used; more specifically, a moving average of each attribute may be computed, and matrix profiling may be used to discover motif patterns of spammers and hence detect abnormalities.
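As a rough illustration of the moving-average idea, the sketch below flags days on which a caller's call count jumps well above the average of the preceding days. The function name, the 3-day window, and the factor of 3 are assumptions chosen for the example; the disclosure does not specify them, and matrix profiling is not shown.

```python
def moving_average_spikes(daily_calls, window=3, factor=3.0):
    """Return the indices of days whose call count exceeds `factor` times
    the moving average of the preceding `window` days."""
    spikes = []
    for i in range(window, len(daily_calls)):
        baseline = sum(daily_calls[i - window:i]) / window
        # A zero baseline is skipped here; a real profile would need a
        # separate rule for callers with no prior history.
        if baseline > 0 and daily_calls[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Illustrative profile: one caller's outgoing calls per day.
daily_calls = [10, 12, 11, 9, 80, 10, 11]
spikes = moving_average_spikes(daily_calls)  # only day index 4 stands out
```

The same comparison with the inequality reversed would flag sudden drops, and the attribute could equally be calls to a specific destination number.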

Returning to the Wangiri detection functionality 142 depicted in FIG. 1, the detection may be provided in various ways. For example, a simple approach may involve using handcrafted rules/heuristics, using knowledge of the scam characteristics. This may not, however, be the best approach because it typically leads to a proliferation of rules over time, exceptions to the rules and so on. Additionally, any rules may have to be frequently tuned manually to account for changes in scammer behaviour. Further still, the developed approach may not be easily applicable to other kinds of scams, potentially necessitating the development of a highly tailored solution for each type of scam.

A machine learning approach may be used to automatically “learn” the characteristics of a particular scam by using labelled examples of the scam. Such an approach can semi-automatically tune itself over time to account for changes in input data, representing, in this case, scammer behavior.

In a non-limiting example, in order to mathematically model the behaviour of Wangiri scammers, the following features may be used, which can be prepared or derived from the call logs.

-   1. dt_from: The lower bound of the time interval within which the Wangiri detection was performed
-   2. dt_to: The upper bound of the time interval within which the Wangiri detection was performed
-   3. anumber: The calling party's number, for which call records are summarized and all the metrics below are computed
-   4. num_outgoing_calls: The number of outgoing calls from the anumber
-   5. num_incoming_calls: The number of incoming calls to the anumber
-   6. incoming_call_rate: The proportion of incoming calls, relative to outgoing calls. This is computed as num_incoming_calls/num_outgoing_calls
-   7. num_callees: The number of unique destination numbers called by this anumber
-   8. callee_rate: The proportion of unique callees, relative to outgoing calls. This is computed as num_callees/num_outgoing_calls
-   9. inter_arrival_time_mean: The average of the inter-arrival time between calls. The inter-arrival time is the interval of time between two successive calls. Note: This is measured in minutes.
-   10. inter_arrival_time_stddev: The standard deviation of the inter-arrival time between calls, measured in minutes
-   11. call_duration_mean: The average of the call duration of all outgoing calls made by this anumber. Note: This is measured in milliseconds. This may be replaced by incoming_call_duration and outgoing_call_duration
-   12. call_duration_stddev: The standard deviation of the call duration of all outgoing calls made by this anumber, measured in milliseconds
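Metrics #4 to #12 in the list above could be derived from call records along the following lines. This is a sketch under assumptions: the record fields (`source`, `destination`, `start`, `duration_ms`) are hypothetical, the population standard deviation is used because the disclosure does not say which form is intended, rates default to 0.0 when there are no outgoing calls, and inter-arrival times are taken between successive outgoing calls only.

```python
from datetime import datetime
from statistics import mean, pstdev


def wangiri_features(anumber, calls):
    """Summarize the call records of one anumber into model features."""
    outgoing = sorted((c for c in calls if c["source"] == anumber),
                      key=lambda c: c["start"])
    incoming = [c for c in calls if c["destination"] == anumber]
    n_out = len(outgoing)
    callees = {c["destination"] for c in outgoing}
    # Inter-arrival times between successive outgoing calls, in minutes.
    gaps = [(b["start"] - a["start"]).total_seconds() / 60
            for a, b in zip(outgoing, outgoing[1:])]
    durations = [c["duration_ms"] for c in outgoing]
    return {
        "anumber": anumber,
        "num_outgoing_calls": n_out,
        "num_incoming_calls": len(incoming),
        "incoming_call_rate": len(incoming) / n_out if n_out else 0.0,
        "num_callees": len(callees),
        "callee_rate": len(callees) / n_out if n_out else 0.0,
        "inter_arrival_time_mean": mean(gaps) if gaps else 0.0,
        "inter_arrival_time_stddev": pstdev(gaps) if gaps else 0.0,
        "call_duration_mean": mean(durations) if durations else 0.0,
        "call_duration_stddev": pstdev(durations) if durations else 0.0,
    }


# Illustrative records: "A" places three short calls and receives one call back.
calls = [
    {"source": "A", "destination": "B", "start": datetime(2021, 1, 1, 10, 0), "duration_ms": 1000},
    {"source": "A", "destination": "C", "start": datetime(2021, 1, 1, 10, 1), "duration_ms": 3000},
    {"source": "A", "destination": "D", "start": datetime(2021, 1, 1, 10, 3), "duration_ms": 2000},
    {"source": "B", "destination": "A", "start": datetime(2021, 1, 1, 10, 5), "duration_ms": 60000},
]
features = wangiri_features("A", calls)
```

A caller that dials many distinct numbers once each has a callee_rate close to 1, which is the kind of pattern these features are meant to expose to the classifier.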

Metrics #4 to #12 above are the predictors (also known as features) in the Wangiri model, while the response is a class label that can take on one of two values: “Wangiri” or “Not Wangiri”. It will be appreciated that this is an example of a binary classification problem.

The approach used to solve this problem is to estimate one or more mathematical functions that describe the relationship(s) between the predictors and the response. The function(s) may typically be estimated from a set of manually labelled data that provides examples of each class. These functions, which constitute a model, may then be used to predict the class label (“Wangiri” or “Not Wangiri”) of future data. Labelled training data for the Wangiri class may be obtained using the investigator interface, which may initially present investigators with anomalous phone numbers to be investigated. The calls that the investigators consider to be Wangiri can be labelled and used for the training data. The non-Wangiri class training data may be obtained from random sampling of the call data since the vast majority of call data passing over a telephone network will not be Wangiri calls.

The Wangiri detection model may use a Random Forest classifier. This particular classifier was determined to be preferable after comparing the performance of several different classifiers on the labelled data. Model hyperparameters (number of estimators and maximum number of features) were chosen using a Grid Search with 10-fold cross-validation, with the objective of choosing the parameter combination that maximized the F1-Score. The rationale for choosing to optimize the F1-Score, rather than the Precision or Recall, is to provide a balance between false positives and false negatives for the initial model. Originally, the selection criteria solely consisted of maximizing the Precision, but a quick ad-hoc analysis showed that some models with slightly lower precisions (−2%) had significantly higher recalls (+20%). The slightly lower precision, which can result in legitimate numbers being incorrectly identified as Wangiri numbers, can be addressed by developing additional rules or filters to filter out the legitimate numbers from the Wangiri numbers. Optimizing on the F1-Score, combined with a business logic layer that protects legitimate customers from accidentally being blocked, provides significant recall while mitigating any consequence of a slightly lower precision.
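The effect of the selection criterion can be seen with a small calculation. The F1-Score is the harmonic mean of Precision and Recall, F1 = 2PR/(P + R). In the sketch below the two candidate models and their scores are hypothetical values chosen only to mirror the "slightly lower precision, much higher recall" situation described above; they are not results from the disclosure.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical candidates: (precision, recall).
candidates = {
    "precision_first": (0.97, 0.70),
    "balanced": (0.95, 0.90),
}

# Selecting on precision alone favours the first candidate...
best_by_precision = max(candidates, key=lambda name: candidates[name][0])
# ...while selecting on F1 favours the candidate with far better recall.
best_by_f1 = max(candidates, key=lambda name: f1_score(*candidates[name]))
```

With these numbers the F1-Scores are roughly 0.81 and 0.92, so a 2-point loss in precision is outweighed by a 20-point gain in recall.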

The best estimator chosen from the Grid Search has the following scores(over 10 folds):

-   Mean F1-Score=0.94; std=0.03
-   Mean Precision=0.96; std=0.04
-   Mean Recall=0.93; std=0.04

The Precision-Recall curve is shown in FIG. 6. The curve indicates that the chosen estimator has good classification performance on the test set.

The labelled dataset used in this modelling process is fairly large and imbalanced (232,477 examples in total; the positive class makes up 1.73% of the total). Because of this, training certain ML algorithms turned out to be infeasible due to very long runtimes. In particular, finding the best estimator using a grid search (or even a random search) for the Support Vector Machine (SVM) with a non-linear kernel and >5-fold cross-validation took unreasonably long. The Random Forest (RF) classifier was chosen mainly for its computational advantages (as well as good classification performance in general), such as the fact that it is inherently parallelizable. Further, RF is relatively less sensitive to the choice of initial values of hyperparameters.

Those skilled in the art will appreciate that it is particularly desirable to have an end-to-end automated system in place that detects and blocks Wangiri scammers, as well as other scams, with minimal human intervention. This blocking may be done automatically; however, depending on the level of false positives that is acceptable, additional logic may be used to further filter out possible legitimate phone numbers that were incorrectly identified as Wangiri numbers. As an example, this logic may, for each suspected Wangiri number, verify that:

-   The number has not been detected as a Wangiri number above some threshold number of times, since typically a Wangiri scammer will not re-use phone numbers;
-   The phone number has some threshold number of international calls, since Wangiri calls typically originate from overseas numbers;
-   The phone number is similar to other phone numbers recently detected as Wangiri numbers, since Wangiri scammers often use blocks of sequential numbers.

It will be appreciated that the above logic may be weighted so that the importance of one test compared to another may be varied as desired. Further, additional or alternative logic may be used to ensure any incorrectly identified Wangiri numbers are not blocked.
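
The weighted verification logic described above can be sketched as follows. This is a minimal illustration only: the field names, thresholds, weights and blocking score are all hypothetical assumptions, since the disclosure leaves these values to the implementer. Integer weights are used so that the comparison against the blocking score is exact.

```python
from dataclasses import dataclass


@dataclass
class NumberStats:
    """Hypothetical per-number summary; the field names are assumptions."""
    prior_wangiri_detections: int
    international_calls: int
    similar_recent_wangiri_numbers: int


# Hypothetical thresholds and weights; real values would be tuned by the operator.
MAX_PRIOR_DETECTIONS = 2
MIN_INTERNATIONAL_CALLS = 10
MIN_SIMILAR_NUMBERS = 1
WEIGHTS = {"not_reused": 2, "international": 5, "similar_block": 3}
BLOCK_SCORE = 7


def verification_score(stats: NumberStats) -> int:
    """Sum the weights of the verification tests that the number passes."""
    checks = {
        "not_reused": stats.prior_wangiri_detections <= MAX_PRIOR_DETECTIONS,
        "international": stats.international_calls >= MIN_INTERNATIONAL_CALLS,
        "similar_block": stats.similar_recent_wangiri_numbers >= MIN_SIMILAR_NUMBERS,
    }
    return sum(WEIGHTS[name] for name, passed in checks.items() if passed)


def should_block(stats: NumberStats) -> bool:
    """Block only when the weighted evidence reaches the assumed score."""
    return verification_score(stats) >= BLOCK_SCORE


print(should_block(NumberStats(0, 25, 3)))
```

Changing the entries of `WEIGHTS` varies the importance of one test relative to another, and raising `BLOCK_SCORE` makes the filter more conservative about blocking.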

A semi-automated approach may be used to block Wangiri phone numbers, or other scam numbers. In a non-limiting example, the semi-automated approach may automatically block verified Wangiri numbers; however, a human investigator may be used to verify that Wangiri numbers predicted by the detection model are in fact Wangiri numbers. For example, the predicted Wangiri numbers may be presented to an investigator, possibly along with additional useful information for verifying that the call is a Wangiri call, and the investigator may then either verify or refute the prediction. In addition, the verified/refuted predictions may also be used as training data to further train the prediction models.

The Wangiri detection model may divide model predictions for predicted Wangiri calls into two buckets, “Wangiri” and “Manual Review”, for display to human analysts. This division is based on a general rule that applies to most Wangiri scam calls, namely that the calling number typically originates from overseas. To quantify this, it is possible to compute:

I=the ratio of international call records to all call records

The division into the two buckets is then performed by applying thresholds on the value of I. The “Wangiri” bucket includes numbers that are, with high confidence, Wangiri scammers. The “Manual Review” bucket includes numbers that, although identified by the ML model as Wangiri, are less certain, taking into account the value of I.
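
A minimal sketch of this bucketing is shown below. It assumes a single threshold on I, whereas the description allows for more than one, and the threshold value and function names are hypothetical.

```python
# Hypothetical threshold on I; the disclosure does not specify a value.
WANGIRI_MIN_RATIO = 0.8


def international_ratio(international_records: int, total_records: int) -> float:
    """I = ratio of international call records to all call records."""
    if total_records == 0:
        return 0.0
    return international_records / total_records


def assign_bucket(
    international_records: int,
    total_records: int,
    threshold: float = WANGIRI_MIN_RATIO,
) -> str:
    """Bucket a number that the model has already predicted to be Wangiri."""
    i = international_ratio(international_records, total_records)
    return "Wangiri" if i >= threshold else "Manual Review"


print(assign_bucket(9, 10))
print(assign_bucket(3, 10))
```

A number with no call records at all is treated here as having I equal to zero, so it falls into the “Manual Review” bucket rather than being blocked on the model prediction alone.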

The items tagged for manual review are intended to be manually investigated and labelled by human analysts. With this process in place, it is possible to create an automatic feedback loop where:

-   The thresholds applied to I are re-discovered from these newly labelled data
-   The model is occasionally retrained by including these newly labelled data
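
The disclosure does not state how the thresholds are re-discovered. One possible approach, sketched below purely as an assumption, is to scan the I values seen in the analyst-labelled data and keep the threshold that maximizes the F1-Score of the rule "I at or above the threshold means Wangiri". The sample data are invented for illustration.

```python
def rediscover_threshold(labelled):
    """Return the threshold on I that maximizes F1 over labelled examples.

    labelled: list of (i_value, is_wangiri) pairs produced by analyst review.
    Returns None when there are no labelled examples.
    """
    best_threshold, best_f1 = None, -1.0
    for candidate in sorted({i for i, _ in labelled}):
        tp = sum(1 for i, y in labelled if i >= candidate and y)
        fp = sum(1 for i, y in labelled if i >= candidate and not y)
        fn = sum(1 for i, y in labelled if i < candidate and y)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = candidate, f1
    return best_threshold


# Invented analyst labels: (value of I, confirmed as Wangiri?)
labelled = [
    (0.95, True), (0.90, True), (0.85, True), (0.75, True),
    (0.70, False), (0.60, False), (0.40, False),
]
print(rediscover_threshold(labelled))
```

Other selection criteria, such as requiring a minimum precision before any number is blocked automatically, could be substituted for the F1-Score in the same loop.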

Alternatively, it is possible to make I a feature for the model itself, rather than post-processing and using thresholds on it.

To avoid overfitting during the automatic training, it is possible to use normal business users' numbers and common users' numbers that have never been flagged (or numbers that are known to be good).

FIG. 3 is a flowchart depicting a method for identifying and blocking phone numbers associated with robocalls and/or scam calls. The method 300 begins with pre-processing raw call log records to identify different records that are associated with the same call (302). Depending upon how often the raw call log data is processed, there may be associated call records that are not in the batch of raw call logs currently being processed and as such the associated call records may have already been pre-processed. The pre-processing may also include determining the features used by the different detection models, or the feature calculation or extraction may be performed after the pre-processing of the raw call log data. Once the raw call log data has been processed, the processed call records may be processed using an anomaly detection model (304) to identify phone numbers exhibiting anomalous behaviour. Phone numbers that are identified as being anomalous may be stored or otherwise identified, for example in an anomalous numbers list 306. The call records may also be processed by one or more models for detecting specific behaviours, such as a Wangiri detection model to identify numbers associated with Wangiri behaviours (308). The phone numbers identified by the model as being Wangiri numbers may be stored or identified, for example in a Wangiri numbers list 310. It is possible to run the anomaly detection model, the Wangiri detection model, and any other detection models either sequentially or concurrently. The anomalous numbers and Wangiri numbers may be verified as scam or undesirable numbers (312). The verification may be done using additional rules or logic or may be done by an investigator or analyst. After numbers have been verified as scam or associated with undesirable behaviour, they may be blocked (314). It will be appreciated that after blocking the numbers, they may be unblocked (316). For example, it may be desirable to unblock numbers, either in the case of incorrectly blocking a legitimate number or if the scammers have stopped using the number.

FIG. 4 depicts a method of pre-processing raw call log records according to a non-limiting aspect of the invention. The method 400 begins with receiving raw call log records (402). The raw call log records may be received and processed in real-time or in batches, for example every 5 minutes. For each raw call log record (404), the raw call log record may be formatted as a call record (406), for example by placing the raw call log record into a standard format. The call records associated with the same call, which may include previously processed call records, are identified (408). The call records identified as being associated with the same call may be aggregated together (410). The call records may be aggregated into a single record, or a label or other indicator may be added to all of the associated call records in order to easily identify which call records are associated with the same call. Once the call records are aggregated, the next call record may be processed (412). After processing the call records, they may be stored and/or passed on to another process (414), such as the previously described detection models.
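
The pre-processing steps of method 400 can be sketched as follows. The sketch assumes, purely for illustration, that every raw record carries a shared `call_id` that identifies the call; real network elements may not provide such a key, in which case matching on calling number, called number and timestamps would be needed instead. All field names are hypothetical.

```python
def format_record(raw: dict) -> dict:
    """Place a raw call log record into a standard format (step 406)."""
    return {
        "call_id": str(raw["call_id"]).strip(),
        "caller": str(raw["caller"]).strip(),
        "callee": str(raw["callee"]).strip(),
        "source": raw.get("source", "unknown"),
    }


def preprocess_batch(raw_records, store=None):
    """Aggregate records belonging to the same call (steps 404 to 412).

    `store` maps call_id to the list of call records seen so far, so that
    records from previously processed batches can be associated as well.
    """
    store = {} if store is None else store
    for raw in raw_records:
        record = format_record(raw)
        store.setdefault(record["call_id"], []).append(record)
    return store


# Two batches; the second contains another record of call "a1".
batch1 = [
    {"call_id": "a1", "caller": "111", "callee": "222", "source": "switch-1"},
    {"call_id": "b2", "caller": "333", "callee": "444", "source": "switch-1"},
]
batch2 = [
    {"call_id": " a1 ", "caller": "111", "callee": "222", "source": "gateway-7"},
]
store = preprocess_batch(batch1)
store = preprocess_batch(batch2, store)
print({call_id: len(records) for call_id, records in store.items()})
```

Here the associated records are kept together under one key, which corresponds to the labelling option described above; merging them into a single aggregated record would be an equally valid choice.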

FIG. 5 depicts a method for unblocking blocked phone numbers according to a non-limiting aspect of the invention. The method 500 may be used to unblock numbers that are no longer being used by scammers so that they may be used for legitimate purposes. It is possible to have other processes to unblock numbers, such as through a customer support interface that allows incorrectly blocked numbers to be easily unblocked. The method 500 may be performed periodically, for example every day, by retrieving a list of blocked numbers (502). For each blocked number (504), it is determined if the number has been blocked for a threshold number of days (506); for example, 5 days may be used as the threshold. If it has not yet been blocked long enough (i.e. No at 506), the number remains blocked and the next number is processed (512). If the blocked number has been blocked for the threshold number of days (i.e. Yes at 506), it is determined if the blocked number has had no call traffic for the past threshold number of days (508). If there has been call traffic in the past threshold number of days (i.e. No at 508), the number remains blocked and the next number is processed (512). If, however, there has been no call traffic for the past threshold number of days (i.e. Yes at 508), the number may be unblocked.
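
The two checks of method 500 can be sketched as below. The data structures and function name are assumptions made for illustration, and the sample dates are invented; only the 5-day example threshold comes from the description. The sketch treats a call made exactly on the cutoff day as traffic within the window, which is a conservative assumption.

```python
from datetime import date, timedelta

THRESHOLD_DAYS = 5  # example threshold given in the description


def numbers_to_unblock(blocked, last_call_dates, today, threshold_days=THRESHOLD_DAYS):
    """Return blocked numbers that pass both checks of method 500.

    blocked: {number: date on which the number was blocked}
    last_call_dates: {number: date of the most recent call attempt}
    """
    cutoff = today - timedelta(days=threshold_days)
    unblock = []
    for number, blocked_on in blocked.items():
        if blocked_on > cutoff:
            continue  # not blocked long enough (No at 506)
        last_call = last_call_dates.get(number)
        if last_call is not None and last_call >= cutoff:
            continue  # call traffic within the window (No at 508)
        unblock.append(number)
    return unblock


today = date(2024, 1, 20)
blocked = {"A": date(2024, 1, 18), "B": date(2024, 1, 1), "C": date(2024, 1, 1)}
last_calls = {"B": date(2024, 1, 19), "C": date(2024, 1, 2)}
print(numbers_to_unblock(blocked, last_calls, today))
```

In this invented example, number A has not been blocked long enough, number B is still generating traffic, and only number C satisfies both conditions.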

Although certain components and steps have been described above, it is contemplated that individually described components, as well as steps, may be combined together into fewer components or steps or the steps may be performed sequentially, non-sequentially or concurrently. Further, although described above as occurring in a particular order, one of ordinary skill in the art having regard to the current teachings will appreciate that the particular order of certain steps relative to other steps may be changed. Similarly, individual components or steps may be provided by a plurality of components or steps. One of ordinary skill in the art having regard to the current teachings will appreciate that the components and processes described herein may be provided by various combinations of software, firmware and/or hardware, other than the specific implementations described herein as illustrative examples.

The techniques of various embodiments may be implemented using software, hardware and/or a combination of software and hardware. Various embodiments are directed to apparatus, e.g. a node which may be used in a communications system or data storage system. Various embodiments are also directed to non-transitory machine, e.g., computer, readable medium, e.g., ROM, RAM, CDs, hard discs, etc., which include machine readable instructions for controlling a machine, e.g., processor to implement one, more or all of the steps of the described method or methods.

Some embodiments are directed to a computer program product comprising a computer-readable medium comprising code for causing a computer, or multiple computers, to implement various functions, steps, acts and/or operations, e.g. one or more or all of the steps described above. Depending on the embodiment, the computer program product can, and sometimes does, include different code for each step to be performed. Thus, the computer program product may, and sometimes does, include code for each individual step of a method, e.g., a method of operating a communications device, e.g., a wireless terminal or node. The code may be in the form of machine, e.g., computer, executable instructions stored on a computer-readable medium such as a RAM (Random Access Memory), ROM (Read Only Memory) or other type of storage device. In addition to being directed to a computer program product, some embodiments are directed to a processor configured to implement one or more of the various functions, steps, acts and/or operations of one or more methods described above. Accordingly, some embodiments are directed to a processor, e.g., CPU, configured to implement some or all of the steps of the method(s) described herein. The processor may be for use in, e.g., a communications device or other device described in the present application.

Numerous additional variations on the methods and apparatus of the various embodiments described above will be apparent to those skilled in the art in view of the above description. Such variations are to be considered within the scope.

1. A system for detecting anomalous call behavior in a telephone network, comprising: one or more processors for executing instructions; and at least one memory for storing instructions, which when executed by at least one of the one or more processors configure the system to perform a method comprising: receiving a plurality of raw call log records; generating processed call records by identifying and associating raw call log records relating to a same call; classifying the processed call records using a model trained to identify phone numbers associated with anomalous call behaviour; identifying a phone number associated with anomalous call behavior; and labelling the phone number identified to be associated with anomalous call behavior.
2. The system of claim 1, wherein the model trained to identify phone numbers associated with anomalous call behavior analyzes features of the processed call records comprising one or more of: a number of unique incoming calls, a number of unique outgoing calls, an incoming call rate, a call duration, a number of unique callees, and a start time of a call.
3. The system of claim 1, wherein the model trained to identify phone numbers associated with anomalous call behavior assigns an anomaly score to each originating phone number in the processed call records, and identifying the phone number associated with anomalous call behavior is based on the anomaly score exceeding an anomaly score threshold.
4. The system of claim 1, wherein the method provided by executing the instructions further comprises filtering phone numbers with a number of outgoing calls less than an outgoing call threshold.
5. The system of claim 1, wherein the model trained to identify phone numbers associated with anomalous call behaviour is trained by performance tuning using one or more sources of verified anomalous phone numbers.
6. The system of claim 1, wherein the model trained to identify phone numbers associated with anomalous call behaviour comprises an Isolation Forest model.
7. The system of claim 1, wherein the method provided by executing the instructions further comprises blocking the phone number determined to be anomalous from making calls over the telephone network.
8. The system of claim 1, wherein the method provided by executing the instructions further comprises: automatically calling the phone number identified to be associated with anomalous call behavior; simulating a call to the phone number; and recording a portion of the call.
9. The system of claim 1, wherein the method provided by executing the instructions further comprises: generating a user interface including an indication of the phone number identified to be associated with anomalous call behavior; providing the generated user interface to an investigator of the telephone network operator; and receiving from the user interface a selection including the phone number for blocking.
10. The system of claim 1, wherein the plurality of raw call log records are received from a plurality of network elements that connect the telephone network to end user equipment and to other telephone networks.
11. A method for detecting anomalous call behavior in a telephone network comprising: receiving a plurality of raw call log records; generating processed call records by identifying and associating raw call log records relating to a same call; classifying the processed call records using a model trained to identify phone numbers associated with anomalous call behaviour; identifying a phone number associated with anomalous call behavior; and labelling the phone number identified to be associated with anomalous call behavior.
12. The method of claim 11, wherein the model trained to identify phone numbers associated with anomalous call behavior analyzes features of the processed call records comprising one or more of: a number of unique incoming calls, a number of unique outgoing calls, an incoming call rate, a call duration, a number of unique callees, and a start time of a call.
13. The method of claim 11, wherein the model trained to identify phone numbers associated with anomalous call behavior assigns an anomaly score to each originating phone number in the processed call records, and identifying the phone number associated with anomalous call behavior is based on the anomaly score exceeding an anomaly score threshold.
14. The method of claim 11, further comprising filtering phone numbers with a number of outgoing calls less than an outgoing call threshold.
15. The method of claim 11, wherein the model trained to identify phone numbers associated with anomalous call behaviour is trained by performance tuning using one or more sources of verified anomalous phone numbers.
16. The method of claim 11, wherein the model trained to identify phone numbers associated with anomalous call behaviour comprises an Isolation Forest model.
17. The method of claim 11, further comprising blocking the phone number determined to be anomalous from making calls over the telephone network.
18. The method of claim 11, further comprising: automatically calling the phone number identified to be associated with anomalous call behavior; simulating a call to the phone number; and recording a portion of the call.
19. The method of claim 11, further comprising: generating a user interface including an indication of the phone number identified to be associated with anomalous call behavior; providing the generated user interface to an investigator of the telephone network operator; and receiving from the user interface a selection including the phone number for blocking.
20. The method of claim 11, wherein the plurality of raw call log records are received from a plurality of network elements that connect the telephone network to end user equipment and to other telephone networks.